Movie Script Scraping Service

A Python-based web service that fetches movie scripts from various online sources including IMSDB and Cinematheque.fr.

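Features

  - Search for a movie script by title across IMSDB and Cinematheque.fr
  - Retrieve the full text of a script along with its source and URL
  - Analyze a script against the Bechdel test
  - Browse interactive API documentation at /docs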

Installation

  1. Clone the repository:
git clone https://github.com/yourusername/script-scraper.git
cd script-scraper
  2. Create a virtual environment and activate it:
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install dependencies:
pip install -e .

Usage

Starting the Server

Run the server with:

python src/main.py

The server will start on http://localhost:8000. You can access the interactive API documentation at http://localhost:8000/docs.

API Endpoints

Search for a Script

GET /scripts/search?title={movie_title}

Example response:

{
  "title": "The Matrix",
  "url": "https://imsdb.com/scripts/Matrix.html",
  "source": "IMSDB"
}
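A minimal client sketch for the search endpoint, using only the Python standard library. It assumes the server is running at the default address above; build_search_url and search_script are hypothetical helper names, not part of this project. The title travels as a query parameter, so it has to be URL-encoded (spaces become +).

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical client helpers; BASE_URL is the default address from "Starting the Server".
BASE_URL = "http://localhost:8000"


def build_search_url(title: str, base_url: str = BASE_URL) -> str:
    """Build the search URL; urlencode escapes spaces and special characters."""
    return f"{base_url}/scripts/search?{urlencode({'title': title})}"


def search_script(title: str, base_url: str = BASE_URL) -> dict:
    """Call GET /scripts/search and decode the JSON body (needs a running server)."""
    with urlopen(build_search_url(title, base_url), timeout=10) as response:
        return json.load(response)
```

With the server running, search_script("The Matrix")["url"] would return the script URL from a response like the example above.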

Get a Script

GET /scripts/{movie_title}

Example response:

{
  "title": "The Matrix",
  "script": "INT. COMPUTER SCREEN\nText flowing in tight corridors...",
  "source": "IMSDB",
  "url": "https://imsdb.com/scripts/Matrix.html"
}
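For this endpoint the title is part of the path rather than the query string, so it needs path-style escaping. The sketch below is a hypothetical client (Script, build_script_url, parse_script, and get_script are illustrative names) that also maps the documented response fields onto a dataclass.

```python
import json
from dataclasses import dataclass
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"  # default server address


@dataclass
class Script:
    """Hypothetical client-side mirror of the GET /scripts/{movie_title} response."""
    title: str
    script: str
    source: str
    url: str


def build_script_url(title: str, base_url: str = BASE_URL) -> str:
    """The title is a single path segment, so escape everything, including '/'."""
    return f"{base_url}/scripts/{quote(title, safe='')}"


def parse_script(payload: dict) -> Script:
    """Keep only the documented fields; a missing field raises KeyError."""
    return Script(
        title=payload["title"],
        script=payload["script"],
        source=payload["source"],
        url=payload["url"],
    )


def get_script(title: str, base_url: str = BASE_URL) -> Script:
    """Fetch and parse a script (needs a running server)."""
    with urlopen(build_script_url(title, base_url), timeout=30) as response:
        return parse_script(json.load(response))
```

The script text arrives with JSON-escaped newlines; once decoded, Script.script contains real line breaks, so splitlines() yields one screenplay line per element.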

Analyze Bechdel Test

GET /scripts/{movie_title}/bechdel

Example response:

{
  "passes_test": true,
  "female_characters": [
    {
      "name": "Trinity",
      "gender": "female",
      "lines": [
        "The answer is out there, Neo.",
        "It's looking for you."
      ]
    }
  ],
  "conversations": [
    {
      "participants": ["Trinity", "Switch"],
      "dialogue": [
        "Is everything in place?",
        "Yes. They don't know we're monitoring."
      ],
      "about_men": false,
      "context": "Opening scene"
    }
  ],
  "failure_reasons": null
}

This endpoint analyzes a movie script using the Bechdel test criteria:

  1. At least two named female characters
  2. Who talk to each other
  3. About something other than a man
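The three criteria can be expressed as a short check. This is a simplified sketch, not the service's actual analysis: it assumes characters and conversations have already been extracted from the script and that each conversation already carries an about_men flag, mirroring the response fields shown above. Character, Conversation, and bechdel_check are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Character:
    """Hypothetical input type mirroring a female_characters entry."""
    name: str
    gender: str
    lines: List[str] = field(default_factory=list)


@dataclass
class Conversation:
    """Hypothetical input type mirroring a conversations entry."""
    participants: List[str]
    dialogue: List[str]
    about_men: bool  # assumed to be decided upstream


def bechdel_check(characters: List[Character],
                  conversations: List[Conversation]) -> Optional[List[str]]:
    """Return None if the test passes, otherwise a list of failure reasons."""
    # Criterion 1: at least two named female characters.
    women = {c.name for c in characters if c.gender == "female" and c.name}
    if len(women) < 2:
        return ["fewer than two named female characters"]

    # Criterion 2: conversations with at least two distinct named women.
    between_women = [
        conv for conv in conversations
        if len(women.intersection(conv.participants)) >= 2
    ]
    if not between_women:
        return ["no conversation between two named female characters"]

    # Criterion 3: at least one of those conversations is not about a man.
    if all(conv.about_men for conv in between_women):
        return ["every conversation between women is about a man"]
    return None
```

Returning None on success lines up with failure_reasons being null in the passing example. In practice the hard parts are attributing dialogue to speakers and deciding whether an exchange is about a man; this sketch leaves both as inputs.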

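The response includes:

  - passes_test: whether the script meets all three criteria
  - female_characters: the female characters found, each with a name, gender, and lines of dialogue
  - conversations: the conversations considered for the test, each with participants, dialogue, an about_men flag, and the scene context
  - failure_reasons: why the script failed the test (null when it passes, as in the example above)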

Error Responses

The API uses standard HTTP status codes to signal errors, and error responses carry a JSON body with an error summary and details.

Error response example:

{
  "error": "Script scraping failed",
  "details": "Connection timeout"
}
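A client can turn the error body shown above into a Python exception. This sketch assumes non-2xx responses carry the error/details JSON from the example and falls back to a generic message when they do not; ScriptServiceError, error_from_body, and get_json are hypothetical names.

```python
import json
from urllib.error import HTTPError
from urllib.request import urlopen


class ScriptServiceError(Exception):
    """Hypothetical exception raised when the service answers with a non-2xx status."""

    def __init__(self, status: int, error: str, details: str = "") -> None:
        super().__init__(f"{status}: {error}" + (f" ({details})" if details else ""))
        self.status = status
        self.error = error
        self.details = details


def error_from_body(status: int, body: bytes) -> ScriptServiceError:
    """Decode the {"error", "details"} body, tolerating non-JSON bodies."""
    try:
        payload = json.loads(body)
    except ValueError:  # covers JSONDecodeError and UnicodeDecodeError
        payload = {}
    if not isinstance(payload, dict):
        payload = {}
    return ScriptServiceError(
        status,
        payload.get("error", "Unknown error"),
        payload.get("details", ""),
    )


def get_json(url: str) -> dict:
    """GET a URL, converting HTTP errors into ScriptServiceError."""
    try:
        with urlopen(url, timeout=30) as response:
            return json.load(response)
    except HTTPError as exc:
        raise error_from_body(exc.code, exc.read()) from exc
```

Keeping error_from_body separate from the network call lets the parsing be tested without a running server.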

Development

Running Tests

Run the test suite with:

pytest

Project Structure

src/
  ├── api/
  │   ├── models.py           # Pydantic models
  │   └── server.py           # FastAPI server
  ├── core/
  │   └── scrapers/
  │       ├── base.py         # Base scraper interface
  │       ├── imsdb.py        # IMSDB implementation
  │       └── cinematheque.py # Cinematheque implementation
  └── main.py                 # Entry point
tests/
  ├── test_api.py             # API tests
  └── test_scrapers.py        # Scraper tests
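The project structure lists a base scraper interface with IMSDB and Cinematheque implementations. Its actual methods are not documented here, so the following is a hypothetical sketch of how such an interface and a source-by-source fallback could look. BaseScraper, SearchResult, search, fetch, and find_script are illustrative names, and trying sources in list order is an assumption.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class SearchResult:
    """Hypothetical search hit, matching the fields of the search endpoint."""
    title: str
    url: str
    source: str


class BaseScraper(ABC):
    """Hypothetical shape of the interface in core/scrapers/base.py."""

    source: str = "unknown"

    @abstractmethod
    def search(self, title: str) -> Optional[SearchResult]:
        """Return the script page for a title, or None if this source lacks it."""

    @abstractmethod
    def fetch(self, url: str) -> str:
        """Download and return the plain script text at url."""


def find_script(title: str,
                scrapers: Iterable[BaseScraper]) -> Optional[Tuple[SearchResult, str]]:
    """Try each source in order (assumed policy) and return the first hit."""
    for scraper in scrapers:
        result = scraper.search(title)
        if result is not None:
            return result, scraper.fetch(result.url)
    return None
```

Under this design, supporting another site would mean adding one more BaseScraper subclass in core/scrapers/ without touching the API layer.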

Contributing

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Write and test your changes
  4. Commit your changes (git commit -m 'feat: add amazing feature')
  5. Push to the branch (git push origin feature/amazing-feature)
  6. Create a Pull Request

License

This project is licensed under the MIT License; see the LICENSE file for details.